Abstract

We present a novel approach to the challenging issue
of protecting confidential data in databases. We adopt the
decision tree framework as our baseline and extend it
to cope with databases in which the class_label attribute
is not specified. We are interested in confidential data
that are randomly distributed over different attributes
(referred to as multi-dimensional inference). For confidential
data protection, our method (referred to as
adaptive modification) mitigates inference by evaluating
and modifying some, not all, relevant data records.
We localize data modification in a decision tree and,
instead of exhaustively evaluating all modification
possibilities, we select informative data to modify. Our
proposed method is effective in protecting confidential
data and scales to large databases.

1  INTRODUCTION 

Safeguarding the confidential data of a database has long been
a challenging issue and is emerging as one of
the most critical information technologies today. The
pressing demand for such protection techniques is partly
due to the trend of information sharing between institutions
and among coalition members, and the opening
of government databases to the public. The
problem that arises when confidential information can
be derived from released data by unauthorized users is
commonly called the database inference problem.

Many of today’s efforts in confidentiality protection
have been geared towards modifying to-be-released
data in order to mitigate inference. Methods of modification
include perturbation ([4]) (i.e., alteration of an
attribute value to a new value), blocking ([2], [13]) (i.e.,
replacement of an existing attribute value with a “?”
indicating ignorance), and aggregation (i.e., combination
of several values into one coarser category) ([14]).
These modifications are made on the basis of a probabilistic
model ([2], [5]), a decision tree ([1]), association
rules ([6], [7]), or rough set theory ([10]).

Our goal is to lay a sound theoretical foundation for
confidential data protection. In this paper, we develop
inference prevention methods on the basis of a decision
tree framework ([12]). The decision tree method
conveniently provides a more localized description of
data records. The structure of the tree can easily be
traced back to individual instances, so the effect of
modifying particular instances on decision making
is clearer. It also delivers excellent performance
on many benchmark test datasets ([12]). In [1],
we applied the decision tree method as our baseline
approach to the inference problem, where confidential
data were represented as values of the class_label
(attribute) of the test data. However, confidential data
may be composed of data from different sources and
may not be restricted to one attribute (i.e., the class_label).
We are interested in cases in which confidential data are
distributed over the entire database (referred to here as
multi-dimensional inference). In this paper,
we extend the decision tree method in order to handle
distributed confidential data.

Decision-theoretic approaches often suffer
from an inability to scale up to large databases.
What limits these approaches the most is not the intricate
decision analysis required, but the repeated exhaustive
evaluation of the entire database
during the modification process. Our approach adopts
an adaptive modification strategy that evaluates and modifies
only selected data, avoiding this exhaustive evaluation while
remaining effective.

2  INFERENCE  PROBLEM 

We  consider  a  simple  two-leveled  security  protocol  ([8]) 
which  has  High  and  Low  users.  The  High  users  (e.g., 
the  database  manager)  view  the  entire  database,  and 
the  Low  users  share  the  High  view  with  the  exception 
of  any  confidential  data.  When  data  are  shared,  High 
releases  some  of  the  non-confidential  data  to  Low. 

The authors of [11] introduced a conceptual model
for database inference and discussed the necessary steps
involved in dealing with the inference problem. High
generates rules from the available data set and then
determines whether there is inference. If the inference
is excessive, High implements a protection plan to
lessen it (i.e., decides to modify the database as it
appears to Low by deleting certain data).
In fact, many database inference papers have alluded
to our inference model. The output of our inference
model is the database that can be released to Low.
Our goal is to make modifications as parsimoniously as
possible and thus avoid imposing unnecessary changes
that lessen functionality.

2.1  Decision  Tree  Method 

Our analysis of data protection is based on the C4.5 decision
tree ([12]). C4.5 uses an information-theoretic
test to evaluate the quality of decision
tree generation. It classifies a new data record by assigning
it the class label possessed by the majority of
the data records at the same leaf node (i.e., the
end of a branch, where a class label is assigned) of the
decision tree as the new data record. By convention,
the attribute used as the class_label is deterministic.
To deal with multi-dimensional inference, we evaluate
the possibility of inference on an arbitrary attribute by
designating it as the class_label (the original
class_label thereby becomes an ordinary attribute).

In our method, attributes that contain confidential
data are viewed as the class labels of the testing
data, and the remainder of the database is considered
non-confidential. In [1], the database inference problem
was viewed as traditional decision tree learning,
and the prevention of database inference dealt exclusively
with attribute values of the training data (which
are only part of the non-confidential data). Confidential
data were associated with only one attribute (i.e.,
the class_label). Inference prevention that is structured
in this way is clearly insufficient. In this paper, we
take into account the entire body of non-confidential
data.

Table 1: Relational Table for Evaluation. A_j denotes
the jth attribute and the “?” denotes an unknown
value, a piece of confidential data, or a previously
modified value.

2.2  Metric 

Table 1 shows a snapshot of a relational data table.
Modification may place more “?”s in the
database. At this instant, one of the M attributes has
been selected as the class_label. Data modification is
likely to degrade database performance. In
our approach, the important performance metrics are
the effectiveness of confidential data protection
(E) and the loss of functionality
(F) in a database. In terms of the decision tree method,
the effectiveness measure for the attribute currently selected
as the class_label is determined by the classification
error of the test data (i.e., the confidential data),
while the measure of loss of functionality is a function
of the classification error of the training data (i.e., the
to-be-released data).

Suppose the jth attribute is posted as the class_label.
Let the measure of protection effectiveness with respect
to the jth attribute be denoted as E_j and the measure
of the loss of functionality be denoted as F_j. The overall
measures E and F for the entire database are
functions (e.g., weighted averages) of the E_j's and F_j's. The
measure F usually has an upper bound given by a
threshold v (i.e., F < v) that represents the maximum
level of information loss that users are willing to
tolerate. With the definitions of E and F in mind, our
optimization goal is to ([3])

Minimize E, while keeping F < v,
i.e., we optimize E with the bound on F as the constraint.


Note  that  the  effect  of  protection  is  evaluated  from 
High’s  perspective,  while  the  database  functionality  is 
evaluated  from  Low’s  view. 
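
As an illustration of how the overall measures might be aggregated, the following minimal sketch computes E and F as weighted averages of per-attribute values and checks the bound F < v. The function names, the choice of weights, and the sample numbers are assumptions made for illustration; the text only states that E and F are functions (e.g., weighted averages) of the per-attribute measures.

```python
def weighted_average(values, weights):
    """Weighted average of per-attribute measures (hypothetical helper)."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total


def overall_metrics(e_per_attr, f_per_attr, weights, v):
    """Return (E, F, feasible), where feasible means F < v.

    e_per_attr[j] and f_per_attr[j] play the role of E_j and F_j;
    the weights are an assumption (e.g., confidential values per attribute).
    """
    e_overall = weighted_average(e_per_attr, weights)
    f_overall = weighted_average(f_per_attr, weights)
    return e_overall, f_overall, f_overall < v


if __name__ == "__main__":
    # Two attributes, the second weighted three times as heavily.
    print(overall_metrics([0.2, 0.4], [0.1, 0.3], [1, 3], v=0.3))
```

A modification round would be accepted only while the third returned value stays true.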


3  ADAPTIVE  MODIFICATION 

Our adaptive strategy exploits the localization
inherent in the decision tree method and modifies
not all attribute values, but only selected
ones. We examine the leaf nodes of a decision tree and
study the statistics of the data records at different leaf
nodes. By restricting modification to a small area of the
database, our approach preserves database functionality.
During modification, we visit the attribute
that contains the largest number of confidential data
records and has the lowest classification error (i.e.,
the highest inference threat). With this selected attribute
in mind, we examine the distribution of the associated
confidential data and modify the leaf node that
has the highest population. Clearly, our search strategy
may not yield an overall optimal solution. However, the
controlled modification scheme effectively avoids the
high computational complexity incurred by exhaustive
search.^1 We describe our modification procedures
at three levels in the following sections.

3.1  Selection  of  Attributes 

Let the total number of confidential attribute values be
denoted as S, the total number of attribute values be
D, and the classification error of the test data with respect
to the jth attribute be Cerr_j. At the first level,
we select from all the M attributes the one that maximizes
the product of the number of associated confidential
data records and the complement of the classification
error:

$$U_{attr} = \max_j \, (1 - Cerr_j) \left( \frac{TE_j}{S} \right)$$

where TE_j is the number of test data records associated with the
jth attribute. Suppose attribute A_j is selected. Then
A_j is posted as the class_label and denoted as C_j.
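
A minimal sketch of this first-level selection follows. It assumes that the classification errors Cerr_j and the counts TE_j have already been computed for each attribute, and it normalizes by S; because the normalizer is the same for every attribute, it does not affect which attribute is chosen. The function name and the sample numbers are hypothetical.

```python
def select_attribute(cerr, te, s):
    """Pick the attribute index j maximizing (1 - Cerr_j) * (TE_j / S).

    cerr[j]: classification error of the test data for attribute j, in [0, 1]
    te[j]:   number of confidential (test) values in attribute j
    s:       total number of confidential attribute values (assumed normalizer)
    """
    if s <= 0:
        raise ValueError("s must be positive")
    scores = [(1.0 - c) * (n / s) for c, n in zip(cerr, te)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


if __name__ == "__main__":
    # Hypothetical three-attribute example.
    print(select_attribute(cerr=[0.05, 0.30, 0.10], te=[4, 10, 6], s=20))
```

In this example the second attribute wins despite its higher error, because it holds half of the confidential values.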

3.2  Selection  of  A  Leaf  Node 

For the given C_j, we choose a leaf node to visit from among
all the leaf nodes of the corresponding decision tree, DT_j.
The selection of the leaf node is determined, at
least, by (1) the number of correctly labeled training
data records at a leaf, and (2) the number of correctly
classified test data records at a leaf. Fewer correctly labeled
training records imply that less effort is required to
alter the present class label at a leaf node. On the other
hand, the more correctly classified test records there are, the
greater the protection gained by modifying the
leaf node. Let the selected leaf node be denoted as L_k
and the immediate attribute (i.e., the last attribute on
the branch) of L_k be A_i. With L_k and C_j in mind, we
record the following statistical information:

•  R_j: training records w.r.t. C_j

•  E_j: testing records w.r.t. C_j

•  Tr: correctly labeled training records at L_k

•  Te: correctly classified testing records at L_k

•  Fr: incorrectly labeled training records at L_k

•  Fe: incorrectly classified testing records at L_k

^1 Exhaustive search means the evaluation of every possible
batch of attribute values of non-confidential data.


Figure 1: L_k is the leaf node and A_i is the immediate
attribute. Te, Tr, Fe and Fr denote the statistical information
of the data records associated with L_k. Assume
that the class label at L_k is “+”.


With this statistical information, the utility function
that combines the above two factors is as follows:

$$U_{leaf} = \frac{Te}{E_j} \cdot \left( 1 - \frac{Tr - Fr}{R_j + E_j} \right)$$

We need information about the data distribution of
the neighbors of L_k (i.e., the leaf nodes for different values of A_i).
At A_i, we store the relationship of the data records
of L_k to those of its fellow leaf nodes as the ratio
α_1 : α_2 : ... : α_l, for the l different values of A_i.
Furthermore, those leaf nodes with the same class label as
L_k are collectively denoted as α_+ and those with different
labels as α_-.


3.3  Selection  of  Modification  Methods 

At the leaf node L_k, our strategy to mitigate inference
involves two aspects:

(S1) Reduce the correctly classified test records.

(S2) Reduce the correctly labeled training records.

The result of (S1) is expected to be a higher classification
error of the test data, while the result of (S2)
may change the value of the class_label at
L_k and thus affect the outcome of decision analysis.
To implement our strategy, we envision the following
three possible ways:

•  (I1) Modify attribute values of correctly classified
test records.

•  (I2) Remove the value of the class_label of correctly
labeled training records.

•  (I3) Modify attribute values of correctly labeled training
records.

For the purpose of minimizing the impact of modification,
in both I1 and I3, we localize changes by replacing
only the values of the immediate attribute
(i.e., A_i) with “?”s.^2 In I2, the values of the class_label
of some training data records are blocked. As a consequence,
a training data record whose class label
is blocked is excluded from the training data
set. Item (S2) is carried out by implementing I2 and I3,
while item (S1) consists only of I1. In both I1 and I3,
we increase the uncertainty of classification by moving
(or redistributing) correct testing and training data
records to neighboring leaf nodes. In both I2 and I3,
the aim is to make the incorrectly labeled
training records outnumber the correctly labeled ones.
The difference between I2 and I3 is that the effect of
I3 depends upon the distribution of data records in the
neighborhood of L_k. Unlike I2, I1 and I3 are intended
to ‘smear’ data records over neighboring leaf nodes.

The choice among the three modification methods
depends on an estimate of the computational cost
and the gain in confidentiality, where the cost refers to the
total number of modifications executed and the gain
refers to the number of data records whose class labels
have been successfully altered.


^2 In decision tree analysis, a data record with “?”s among its
attribute values is called uncertain evidence. Suppose the
value of the hth attribute is “?” and the hth attribute is
used in the record's classification path. Then the impact (or weight) of
this data record is split among the group of leaf nodes under the
hth attribute according to the population distribution.


I1. The cost of I1 is Te, for all correctly classified
test records will be modified. On the other hand, the
gain is (α_- / (α_+ + α_-)) Te, because those (α_- / (α_+ + α_-)) Te
data records that used to be correct now become incorrect, where
α_- / (α_+ + α_-) is the fraction of re-distributed test
records that will receive an incorrect label. Subtracting the
gain from the cost yields the net loss of I1, which is
(α_+ / (α_+ + α_-)) Te.

I2. For I2, the condition of applicability is that
Tr and Fr are close. Because the removal of the class
label of a training record results in its deletion, the cost of
I2 is Tr - Fr + 1, meaning that the number of removals is
determined by the difference between Tr and Fr. After
deletion, the class label associated with L_k will change.
This means there are Te test records that will change to
a wrong class label and Fe test records that will change
to the correct one. So, the gain is Te - Fe. (Note that
if Te < Fe, modification of L_k is avoided.) The net
loss for I2 is therefore (Tr - Fr + 1) - (Te - Fe).

I3. For I3, the applicability is the same as for I2.
As in the case of I1, modification is restricted to
the value of the nearest attribute (i.e., A_i). The gain
from applying I3 is also Te - Fe, as for
I2. However, the calculation of the cost is more involved,
because a modified training record becomes
uncertain evidence whose impact (or weight) is
distributed among the different values (i.e., leaf nodes)
of A_i. The number of changes (denoted as c) needed
to alter the associated class label can be determined
iteratively as the smallest c satisfying^3

$$Tr - c + \sum_{i=1}^{c} \alpha_i < Fr.$$

In this case, the cost of I3 is c. Putting together the
cost and the gain, the net loss of I3 is
c - (Te - Fe).
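
The iterative determination of c can be sketched as follows. The sketch takes the per-modification returned fractions α_1, α_2, ... as input rather than deriving them, so how they are obtained from the neighboring leaf populations is left as an assumption; the function name and the sample values are hypothetical.

```python
def changes_needed(tr, fr, alphas):
    """Smallest c with Tr - c + sum(alpha_1..alpha_c) < Fr, or None.

    tr, fr:  correctly / incorrectly labeled training records at the leaf
    alphas:  iterable of returned fractions alpha_i, one per modification
             (supplied by the caller; their computation is assumed)
    """
    mass_back = 0.0
    for c, alpha in enumerate(alphas, start=1):
        if c > tr:
            break  # cannot modify more records than are correctly labeled
        mass_back += alpha
        if tr - c + mass_back < fr:
            return c
    return None  # the label cannot be flipped with the given fractions


if __name__ == "__main__":
    # Half of each modified record's weight returns to the leaf.
    print(changes_needed(tr=10, fr=7, alphas=[0.5] * 10))
```

When every α_i is zero, the loop stops at c = Tr - Fr + 1, which matches the cost stated for I2.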


^3 Let c be the number of necessary changes, with values at
A_i being replaced by “?”s. For each modification, Tr becomes
(Tr - 1 + α_i), where α_i is the fraction of the mass of the modified
record that comes back through re-distribution. For the sign of the
class label to change, we want

$$Tr - c + \sum_{i=1}^{c} \alpha_i < Fr,$$

with

$$\alpha_{i+1} = \frac{\alpha_+ - i + \sum_{k=1}^{i} \alpha_k}{\alpha_+ + \alpha_- - 1}.$$


3.4  Control  Step 

By comparing the net losses of the three approaches,
we pick the modification method with the minimum
loss. Modification hides one attribute value at a time
until either the leaf node is exhausted or the threshold
of allowed modifications for the present class_label
is reached, where the threshold is determined
by the ratio of the number of confidential
values in this particular attribute (i.e., the
class_label) to the number in the other attributes. After
modification is carried out, we compute E and F and determine
whether or not F exceeds the given threshold. If it does
not, our modification procedures are repeated from
the top level.
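
The comparison of net losses can be sketched as below. The fraction of redistributed test records that end up mislabeled (wrong_fraction) and the I3 change count c are taken as inputs, and skipping I2 and I3 when Te < Fe is one reading of the note in the I2 discussion; the function name and the sample numbers are hypothetical.

```python
def choose_method(te, fe, tr, fr, wrong_fraction, c=None):
    """Return (best_method, losses) using net loss = cost - gain.

    te, fe: correctly / incorrectly classified test records at the leaf
    tr, fr: correctly / incorrectly labeled training records at the leaf
    wrong_fraction: share of redistributed test records that become wrong
    c: number of changes I3 needs to flip the label, or None if unknown
    """
    # I1: cost Te, gain wrong_fraction * Te.
    losses = {"I1": te - wrong_fraction * te}
    if te >= fe:  # assumed reading: flipping the label is avoided if Te < Fe
        # I2: cost Tr - Fr + 1, gain Te - Fe.
        losses["I2"] = (tr - fr + 1) - (te - fe)
        if c is not None:
            # I3: cost c, gain Te - Fe.
            losses["I3"] = c - (te - fe)
    best = min(losses, key=losses.get)
    return best, losses


if __name__ == "__main__":
    print(choose_method(te=8, fe=1, tr=5, fr=4, wrong_fraction=0.5, c=3))
```

With few training records and many test records at the leaf, as in this example, I2 has the smallest net loss.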

4  DISCUSSION 

Decision-theoretic approaches to confidential
data protection have been widely pursued by researchers
from different fields. Exhaustive evaluation incurs extremely
high computational complexity and hence impedes
the scalability of existing approaches. We presented
an adaptive modification method based
on the decision tree framework. The transparency of
decision trees makes them an excellent tool for analyzing
how specific data modifications may affect inference
possibilities. Our adaptive strategy selects and modifies
the most informative attribute values, using information
about the statistical distribution obtained from decision
tree analysis, to handle the database inference problem
effectively and parsimoniously. Furthermore,
it localizes modification operations in a manner that
preserves database performance.

4.1  Complexity 

The gain in computational complexity is clear. Let
M, N, S and G denote, respectively, the number of
attributes, data records, confidential attribute values,
and modified attribute values that are sufficient for
data protection. In the (batch) exhaustive evaluation,
the complexity is the combinatorial $\binom{MN - S}{G}$,^4 while
in our approach it is of the polynomial order M^2 S. In
fact, because exhaustive search involves a large number
of repeated tree generations, it becomes impractically
expensive even for a small relational table of
dimension N = 20 and M = 5 with v = 25%. With
our proposed method, we are able to obtain satisfactory
results in terms of performance and protection.

^4 If the number of attribute values to be modified is not known
a priori, different values of G will be tested until the performance
bound v is met in the average sense.
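
To give a feel for the gap, the sketch below counts the candidate batches under the assumption that exhaustive evaluation considers every way of choosing G of the MN - S non-confidential values, and compares that count with M^2 S. The values of S and G are hypothetical, chosen only for illustration; N and M follow the small table mentioned above.

```python
from math import comb


def exhaustive_batches(m, n, s, g):
    """Number of ways to choose g of the m*n - s non-confidential values
    (assumed reading of the batch exhaustive search)."""
    return comb(m * n - s, g)


def adaptive_order(m, s):
    """Polynomial order M^2 * S quoted for the adaptive approach."""
    return m ** 2 * s


if __name__ == "__main__":
    m, n = 5, 20      # table dimensions from the text
    s, g = 20, 11     # hypothetical counts for illustration
    print("exhaustive batches:", exhaustive_batches(m, n, s, g))
    print("adaptive order:    ", adaptive_order(m, s))
```

Each exhaustive batch also requires regenerating the decision tree, so the practical gap is wider than the raw counts suggest.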

We are presently experimenting with some data sets
from the UCI repository (e.g., [9]) and will test various
KDD databases. We have tested exhaustive
search, single-attribute-valued best-first search,^5 and
our informative modification.

4.2  Evaluation 

As mentioned, our evaluation of confidential data protection
is based on the average of the classification error
of the test data (i.e., the modified non-confidential
data) with respect to each class_label. We justify the
proposed approach by comparing its results with those
obtained from a best-first search. With the well-known
voting records dataset ([9]), the proposed approach selects
and modifies attribute values of test records,
with modification method I1 being chosen. The results
of modification are close to those obtained from the
best-first search. For instance, with 20 confidential attribute
values, in the best-first search the classification error
increases from 5.26% to 57.90% with 11 modifications.
The proposed adaptive modification selects 9 modifications,
all of which are among those 11. Modification method
I2 is likely to be selected when a leaf node
is associated with a small number of training records
but a large number of test data records. The adaptive
modification was motivated by our experiments with
best-first search, and thus the performance of our proposed
selection strategy is expected to be very close to
that of the best-first strategy on the voting
record data. In addition to reducing computational
complexity, the adaptive modification avoids the
unnecessary modifications, possibly large in number,
that the best-first strategy may make at the beginning and
towards the end of selection (due to the ineffectiveness
of the data selection criterion under certain
conditions).

^5 In this approach, for each attribute value, we estimate its
impact on the average classification error. After evaluating all
attribute values, we hide the attribute value with the maximum
impact and update the decision tree. Then we begin the next
round of selection.


4.3  Restoration 

An attacker may know our inference prevention strategy.
As a result, modified attribute values can be restored
and hence confidential data are not properly
protected. We performed experiments in which the remaining
unhidden attribute values were used to infer
the attribute values that had been hidden in our
first experiment. For the voting data, the results
show that the previously hidden attribute values (e.g.,
“physician fee freeze”) are restored by using some other
attributes (e.g., “El Salvador aid”). In view of the
possibility of modified values being restored, we repeat
the process of attribute value hiding, making previously
hidden attributes confidential, until the restoration
risk goes below a specified threshold. The need for repeated
hiding (referred to as the ramification problem
of database inference [2]) presents a challenge to the
value suppression (e.g., blocking) modification strategy.
We will explore restoration and other types of
attacks in our future work.

5  FUTURE  WORK 

Our present blocking-based modification may not be
the most effective means of modifying data. We will
also experiment with the perturbation method. We have
not yet discussed the value-restoration problem, in which
an attacker might restore blocked values in the same
manner that he restores confidential data. Also, we did
not discuss particular numerical values that might be
assigned to the tolerance level. We leave these issues
as part of our future work.

Our evaluation of the proposed method is based on
an empirical study comparing it with different decision
tree approaches. We will evaluate it against other existing
methods based on different frameworks for microdata
suppression.